1. Introduction

Many  of  the  classical  statistical  procedures  which  are  superior  to  their 
competitors  under  the  assumed  model  have  one  drawback,  namely,  that  their 
behavior  is  seriously  affected  if  a few  gross  errors  are  present  in  the  sample. 

For example, consider the problem of estimating the mean $\theta$ of a univariate
normal population. It is well known that the sample mean is the uniformly minimum
variance unbiased estimate of $\theta$, but it is not a very good estimate if there
are gross errors in the sample. Hodges and Lehmann (1963) have proposed a class of
estimates for the location parameter based on rank test statistics; the estimates
belonging to this class are approximately normally distributed, provided the sample
size is sufficiently large. Gupta and Huang (1974) have investigated selection
procedures based on one-sample Hodges-Lehmann estimates of location for the
problem of selecting a subset containing the largest $t$ ($1 \le t < k$) location
parameters from $k$ ($k \ge 2$) given populations, assuming the sample size is large.

An important member of the class of Hodges-Lehmann estimates is the sample
median. Apart from having a simple analytic form for its distribution, the
sample median has other useful properties as an estimate of location. Intuitively,
a reasonable estimate of location should have a distribution which, in some sense,
is centered on the true location parameter value. Hodges and Lehmann (1963) show
that the sample median has a distribution which is symmetric about the true
parameter value if the underlying distribution is symmetric; if the underlying
distribution is not symmetric, the sample median is a median unbiased estimate
of location, i.e., the median of its distribution coincides with the true
location parameter. In this paper we investigate a procedure based on sample
medians for selection of the largest location parameter of $k$ ($\ge 2$)
populations.

In Section 2.0 some notation used in this paper is introduced. In
Sections 3 and 4 the problem of selecting a subset containing the largest of
$k$ ($\ge 2$) location parameters is considered, and a selection rule based on
sample medians is proposed and investigated.

Section 5 investigates selection rules for normal means which are slight
modifications of the rule proposed in Section 3. In Section 5.3 the proposed
rule based on sample medians is compared to Gupta's rule based on sample means
[see Gupta (1965)], when the normal means are equally spaced. It appears from
the numerical computations that, as expected, Gupta's rule is superior. In
Section 5.5 we define and compute the asymptotic relative efficiency (ARE) of
rules based on sample medians relative to rules based on sample means. For the
normal case the medians procedure is inferior to the means procedure, the ARE
being $2/\pi$. For the contaminated normal population, however, the medians
procedure fares better than it does in the normal case, as the ARE is found to
be an increasing function of the variance of the contaminating normal
populations. In Section 5.6 a test of homogeneity based on sample medians is
proposed, and a relation between the test and the selection rule of Section 5.1
is established. Section 5.7 deals with the distribution of a statistic useful
in some selection and ranking problems, and its percentage points are computed.




As mentioned above, the means procedure is better than the medians procedure
if the underlying distributions are normal. This may be due to the fact that
the normal density has short tails, and hence the probability of getting extreme
observations is very small. If the underlying distributions have longer tails,
for example logistic and double exponential, extreme observations are more
frequent and they have a serious effect on the sample means, but not on the
sample medians. In these situations the medians procedure should perform better
than in the normal case. This heuristic argument is strengthened by the fact
that, for logistic populations, the ARE is $\pi^2/12$, and for double
exponential populations, the ARE is 2. This is the subject of Section 6.


2.0  Preliminaries  and  Notations 

Let $X_1, \ldots, X_{2m+1}$ be $(2m+1)$ independent observations from a
population with cumulative distribution function (cdf) $F(x,\theta)$ and
probability density function (pdf) $f(x,\theta)$, $x, \theta \in \mathbb{R}$,
the real line. Then the sample median $\tilde{X}$ is given by

$$\tilde{X} = X_{[m+1]}$$

where $X_{[1]} \le \cdots \le X_{[2m+1]}$ are the ordered $X_i$.

The pdf of $\tilde{X}$ is

$$g(x,\theta) = \frac{(2m+1)!}{(m!)^2}\,[F(x,\theta)]^m\,[1-F(x,\theta)]^m\,f(x,\theta) \tag{2.0.1}$$

and the corresponding cdf is

$$G(x,\theta) = c_m \int_{-\infty}^{x} [F(u,\theta)]^m\,[1-F(u,\theta)]^m\,f(u,\theta)\,du = I_{F(x,\theta)}(m+1,m+1) \tag{2.0.2}$$

where $c_m = (2m+1)!/(m!)^2$ and $I_y(p,q)$ is the incomplete beta function:

$$I_y(p,q) = \int_0^y u^{p-1}(1-u)^{q-1}\,du \,\big/\, B(p,q), \quad 0 \le y \le 1,\ p, q > 0. \tag{2.0.3}$$

The following result from Karlin (1968) will be used in later sections:


Lemma 2.0.1: If $f(x,\theta)$ has monotone likelihood ratio (MLR) in $x$ and
$\theta$, then $g(x,\theta)$ given by (2.0.1) has MLR in $x$ and $\theta$.

Note that the above result is true for the distribution of any order statistic.
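The density (2.0.1) and the cdf (2.0.2) of the sample median are straightforward
to evaluate numerically. The following is a minimal sketch, not part of the
original report; the function names are hypothetical, and the incomplete beta
function is evaluated through the binomial tail identity, which applies because
both of its parameters equal the integer $m+1$.

```python
from math import comb


def median_pdf(x, m, F, f):
    """Density (2.0.1) of the median of 2m+1 i.i.d. draws with cdf F and pdf f."""
    c = (2 * m + 1) * comb(2 * m, m)  # equals (2m+1)! / (m!)^2
    Fx = F(x)
    return c * Fx**m * (1.0 - Fx) ** m * f(x)


def inc_beta_sym(y, m):
    """I_y(m+1, m+1) for integer m >= 0, via P(Binomial(2m+1, y) >= m+1)."""
    n = 2 * m + 1
    return sum(comb(n, j) * y**j * (1.0 - y) ** (n - j) for j in range(m + 1, n + 1))


def median_cdf(x, m, F):
    """Cdf (2.0.2) of the sample median."""
    return inc_beta_sym(F(x), m)


# Illustration (an assumed example): uniform(0, 1) observations, n = 3 (m = 1).
def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)


def uniform_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


print(median_pdf(0.5, 1, uniform_cdf, uniform_pdf))
print(median_cdf(0.5, 1, uniform_cdf))
```

For the uniform case with $m = 1$ the density reduces to $6x(1-x)$, which gives
a quick hand check of the constant $(2m+1)!/(m!)^2$.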

3.0  On  Procedures  Based  on  Sample  Medians  for  Selection  of  the  Largest  Location 
Parameter. 

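Let $\pi_1, \ldots, \pi_k$ be $k$ independent populations with cdf's
$F(x-\theta_1), \ldots, F(x-\theta_k)$, respectively. Let $X_{i1}, \ldots, X_{in}$
be a sample of size $n = 2m+1$ ($m \ge 1$) from $\pi_i$, $i = 1, \ldots, k$.
Then the pdf $g(x-\theta_i)$ and the cdf $G(x-\theta_i)$ of the sample median
$\tilde{X}_i$ from $\pi_i$ can be obtained from (2.0.1) and (2.0.2) by
substituting $f(x,\theta_i) = f(x-\theta_i)$ and $F(x,\theta_i) = F(x-\theta_i)$.

For selecting a subset containing the population associated with the largest
location parameter $\theta_{[k]}$, so that the probability of including that
population in the subset is at least a preassigned constant $P^*$
($1/k < P^* < 1$), we consider the following procedure:

$$R:\ \text{select } \pi_i \text{ iff } \tilde{X}_i \ge \tilde{X}_{[k]} - d \tag{3.0.1}$$

where $\tilde{X}_{[k]}$ is the largest of the $k$ sample medians and $d \ge 0$
is chosen to satisfy the basic $P^*$-condition.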

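Let $\tilde{X}_{(i)}$ be the (unknown) sample median which corresponds to the
$i$-th ordered parameter $\theta_{[i]}$ ($i = 1, \ldots, k$). Then

$$P(CS|R) = P(\tilde{X}_{(k)} \ge \tilde{X}_{(j)} - d,\ j = 1, \ldots, k-1)$$

$$= \frac{(2m+1)!}{(m!)^2} \int_{-\infty}^{\infty} \prod_{j=1}^{k-1} \left[ I_{F(u+d+\theta_{[k]}-\theta_{[j]})}(m+1,m+1) \right] [F(u)]^m\,[1-F(u)]^m\,f(u)\,du \tag{3.0.2}$$

Each factor in the product is nondecreasing in $\theta_{[k]} - \theta_{[j]} \ge 0$,
so it is clear from (3.0.2) that the infimum of the probability of a correct
selection occurs when all the location parameters are equal, and hence the
constant $d$ is given by

$$\int_{-\infty}^{\infty} [G(u+d)]^{k-1}\,g(u)\,du = P^* \tag{3.0.3}$$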
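Equation (3.0.3) has no closed-form solution in general, but its left side is
increasing in $d$, so $d$ can be found by bisection. The sketch below is an
illustration under stated assumptions and is not the computation used in the
report: it takes $F$ to be the standard normal cdf, integrates with composite
Simpson's rule on $[-8, 8]$ instead of Gauss-Hermite quadrature, and uses
hypothetical function names.

```python
from math import comb
from statistics import NormalDist

STD = NormalDist()  # standard normal, the assumed underlying distribution F


def inc_beta_sym(y, m):
    """I_y(m+1, m+1) for integer m, via the binomial tail identity."""
    n = 2 * m + 1
    return sum(comb(n, j) * y**j * (1.0 - y) ** (n - j) for j in range(m + 1, n + 1))


def g(u, m):
    """Density (2.0.1) of the median of 2m+1 standard normal observations."""
    c = (2 * m + 1) * comb(2 * m, m)
    p = STD.cdf(u)
    return c * p**m * (1.0 - p) ** m * STD.pdf(u)


def pcs_equal_params(d, k, m, lo=-8.0, hi=8.0, steps=2000):
    """Left side of (3.0.3): integral of G(u+d)^(k-1) g(u) du (Simpson's rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += w * inc_beta_sym(STD.cdf(u + d), m) ** (k - 1) * g(u, m)
    return total * h / 3.0


def solve_d(p_star, k, m, tol=1e-6):
    """Bisection for the d >= 0 that makes the left side of (3.0.3) equal P*."""
    lo, hi = 0.0, 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_equal_params(mid, k, m) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(solve_d(0.90, k=3, m=1))
```

At $d = 0$ the integral equals $1/k$, which is why the $P^*$-condition requires
$P^* > 1/k$ for a nonnegative solution to exist.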

3.1  Expected  Size  of  the  Selected  Subset 

The size of the subset selected by the rule $R$ is a random variable which
takes values in the set $\{1, \ldots, k\}$. It is desirable that the size of the
selected subset be small, and also that the ranks of the selected populations be
large, where the population associated with $\theta_{[i]}$ is given rank $i$
($i = 1, \ldots, k$). The expected size of the selected subset and the expected
sum of ranks of the selected populations have been proposed as criteria of
efficiency of selection rules [see, for example, Gupta (1965)].

Let $S$ and $S_r$ be random variables denoting the size of the selected subset
and the sum of ranks of the selected populations, respectively. Then

$$E_\theta(S|R) = \sum_{i=1}^{k} P_\theta(i|R) \tag{3.1.4}$$

and

$$E_\theta(S_r|R) = \sum_{i=1}^{k} i\,P_\theta(i|R) \tag{3.1.5}$$

where $P_\theta(i|R)$ is the probability with which the rule $R$ selects the
population associated with $\theta_{[i]}$, $i = 1, 2, \ldots, k$, and is given by

$$P_\theta(i|R) = \frac{(2m+1)!}{(m!)^2} \int_{-\infty}^{\infty} \prod_{j \ne i} \left[ I_{F(u+d+\theta_{[i]}-\theta_{[j]})}(m+1,m+1) \right] [F(u)]^m\,[1-F(u)]^m\,f(u)\,du \tag{3.1.6}$$

3.2. Some Properties of the Rule R

(i) Upper bound on $E_\theta(S|R)$

We assume that $f(x-\theta)$ has MLR in $x$ and $\theta$. Then it follows
from Lemma 2.0.1 that $g(x-\theta)$ has MLR in $x$ and $\theta$. Hence from
Theorem 1 of Gupta (1965), we have

$$\max_{\theta \in \Theta} E_\theta(S|R) = \max_{\theta \in \Theta_0} E_\theta(S|R) = kP^*$$

where $\Theta_0 = \{(\theta_1, \ldots, \theta_k) \in \Theta : \theta_1 = \cdots = \theta_k\}$.
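The probability of a correct selection and the expected subset size can also be
estimated by simulation, which gives an informal check on the bound $kP^*$.
The following Monte Carlo sketch is an illustration only: it assumes normal
observations with unit variance, and the function name and its arguments are
hypothetical.

```python
import random
from statistics import median


def simulate_rule(thetas, m, d, reps=2000, seed=0):
    """Estimate P(CS|R) and E(S|R) for rule R: select population i iff its
    sample median is at least (largest sample median - d)."""
    rng = random.Random(seed)
    k = len(thetas)
    n = 2 * m + 1
    best = max(range(k), key=lambda i: thetas[i])  # index of the largest parameter
    correct = 0
    total_size = 0
    for _ in range(reps):
        meds = [median(rng.gauss(t, 1.0) for _ in range(n)) for t in thetas]
        cutoff = max(meds) - d
        selected = [i for i in range(k) if meds[i] >= cutoff]
        correct += best in selected
        total_size += len(selected)
    return correct / reps, total_size / reps


# Equal parameters are the least favourable case for P(CS|R).
print(simulate_rule([0.0, 0.0, 0.0], m=1, d=1.5))
```

With $d = 0$ only the population with the largest sample median is retained,
and with very large $d$ every population is retained, so the estimated subset
size moves between 1 and $k$ as $d$ grows.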

(ii) Monotonicity

It is clear from the expression (3.1.6) that the rule $R$ is strongly monotone
[see Santner (1975)], i.e., $P_\theta(i|R)$ is

  nondecreasing in $\theta_{[i]}$ if the remaining components of $\theta$ are
  kept fixed;

  nonincreasing in $\theta_{[j]}$, $j \ne i$, if the remaining components of
  $\theta$ are kept fixed.

The following two properties of $R$ are immediate consequences of strong
monotonicity:

(a) $R$ is monotone: $P_\theta(i|R) \ge P_\theta(j|R)$, $1 \le j < i \le k$

(b) $R$ is unbiased: $P_\theta(k|R) \ge P_\theta(i|R)$, $1 \le i \le k$

(iii) Minimaxity with Respect to the Expected Subset Size Among Rules
Based on Medians

A selection rule $R^*$ is said to be minimax with respect to $S$ if

$$\sup_{\theta} E_\theta(S|R^*) = \inf_{R'} \sup_{\theta} E_\theta(S|R')$$

where the infimum is taken over all selection rules $R'$ which satisfy the
$P^*$-condition.

The pdf $g(x,\theta)$ is clearly of location type and it has MLR in $x$ and
$\theta$. It follows from Theorem 1.4.2 of Berger (1977) that the selection
rule $R$ is minimax with respect to $S$ among all rules based on sample medians.

4.0 Selection of a Subset Containing all Location Parameter Populations Better
Than a Control

Suppose we have $k+1$ independent populations $\pi_0, \pi_1, \ldots, \pi_k$ with
densities $f(x,\theta_0), f(x,\theta_1), \ldots, f(x,\theta_k)$, respectively.
The population $\pi_0$ is a standard or control population. The population
$\pi_i$ is said to be better than the control population $\pi_0$ if
$\theta_i \ge \theta_0$. We are interested in the subset of populations which
are better than the control. Gupta and Sobel (1958) have considered the above
problem for several distributions and have investigated rules based on
sufficient statistics which select a subset such that all populations better
than the control are included in the subset with probability at least $P^*$,
where $P^*$ ($0 < P^* < 1$) is a preassigned constant. We will consider the case
of location parameter populations, where $f(x,\theta_i) = f(x-\theta_i)$,
$i = 0, 1, \ldots, k$, and investigate a rule based on sample medians. The
parameter $\theta_0$ may or may not be known. We consider these two cases
separately:

(a) $\theta_0$ known:

Here we are given sample medians $\tilde{X}_i$ of $n = 2m+1$ independent
observations from $\pi_i$ ($i = 1, 2, \ldots, k$). Consider the rule $R_a$
defined as follows:

$$R_a:\ \text{select } \pi_i \text{ iff } \tilde{X}_i \ge \theta_0 - a \tag{4.0.1}$$

where $a$ is the smallest value satisfying the $P^*$-condition.

Let $k_1$ denote the number of populations that are better than (as good as)
the control, and $k_2$ denote the number of populations for which
$\theta_i < \theta_0$. Then $k_1 + k_2 = k$. Also let primes (') refer to the
$k_1$ populations better than the control. Then we have

$$P(CS|R_a) = \prod_{i=1}^{k_1} P(\tilde{X}'_i \ge \theta_0 - a)$$

Clearly

$$\inf P(CS|R_a) = \left[ I_{1-F(-a)}(m+1,m+1) \right]^{k_1} \tag{4.0.2}$$

In case $k_1$ is not known, a lower bound for $P(CS|R_a)$ can be obtained by
substituting $k_1 = k$ in (4.0.2), and thus a conservative value of the constant
$a = a(P^*, m, k)$ can be obtained from the following equation:

$$I_{1-F(-a)}(m+1,m+1) = (P^*)^{1/k} \tag{4.0.3}$$
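Because the incomplete beta function in (4.0.3) is increasing in $a$, the
conservative constant can be obtained by bisection. The sketch below is
illustrative and not taken from the report: it assumes a standard normal $F$ by
default, and `solve_a` and its arguments are hypothetical names.

```python
from math import comb
from statistics import NormalDist


def inc_beta_sym(y, m):
    """I_y(m+1, m+1) for integer m, via the binomial tail identity."""
    n = 2 * m + 1
    return sum(comb(n, j) * y**j * (1.0 - y) ** (n - j) for j in range(m + 1, n + 1))


def solve_a(p_star, k, m, cdf=NormalDist().cdf, tol=1e-8):
    """Solve I_{1 - F(-a)}(m+1, m+1) = P*^(1/k), i.e. equation (4.0.3)."""
    target = p_star ** (1.0 / k)
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inc_beta_sym(1.0 - cdf(-mid), m) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Conservative constant for k = 5 populations, n = 3 observations each.
print(solve_a(0.90, k=5, m=1))
```

Setting `k=1` in the same function solves the equation with right side $P^*$
itself, the form that arises when exactly one population is better than the
control.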


Expected Size of the Selected Subset.

The size of the subset selected by the rule $R_a$ is a random variable which
takes values in the set $\{0, 1, \ldots, k\}$. Letting $S$ denote the size of a
subset selected by a selection rule, the expected value of $S$ can be looked
upon as a measure of the performance of the rule [see Gupta (1965)]. Now

$$E(S|R_a) = \sum_{i=1}^{k} P\{\pi_i \text{ is selected}\} = \sum_{i=1}^{k} I_{1-F(\theta_0-\theta_i-a)}(m+1,m+1) \tag{4.0.4}$$

Let $\theta_{[1]} \le \theta_{[2]} \le \cdots \le \theta_{[k]}$ be the ordered
parameters and let $P_{[i]}(R_a)$ be the probability that the rule $R_a$ selects
the population associated with $\theta_{[i]}$, $i = 1, \ldots, k$. Then

$$E(S|R_a) = \sum_{i=1}^{k} P_{[i]}(R_a) \tag{4.0.5}$$

4.1. Probability that the Selected Subset Contains Only the Populations Better
Than the Control.

Assume that $k_1$ of the given $k$ populations have parameter $\theta + \delta$
and the remaining populations have parameter $\theta$, where $\theta$, $\delta$
and $\theta_0$ are such that $\theta + \delta \ge \theta_0 > \theta$. In this
situation it is meaningful to ask for the probability that the rule selects
exactly $k_1$ populations [cf. Gupta (1965)]. We will consider the special case
$k_1 = 1$. Letting $P_1(R_a)$ denote the probability of selecting exactly one
population, we have


k 

* n - ■F(e„-e-a)<'”'''^'>l  t'F(6„-e-«-a)''^'’'^'>lC‘F(eo-e-a><"*' •”*'>’ 

(4.1.1) 

In this case, the value of $a$ is to be obtained from the equation

$$I_{1-F(-a)}(m+1,m+1) = P^* \tag{4.1.2}$$

(b) $\theta_0$ unknown:

In this case $(2m+1)$ independent observations are taken from $\pi_0$. Let
$\tilde{X}_0$ be the median of the sample from $\pi_0$. We consider the
following selection rule:

$$R_b:\ \text{select } \pi_i \text{ iff } \tilde{X}_i \ge \tilde{X}_0 - b \tag{4.1.3}$$

where the constant $b$ is chosen to satisfy the basic $P^*$-condition.

We have, as in Case (a),

$$P(CS|R_b) = \frac{(2m+1)!}{(m!)^2} \int_{-\infty}^{\infty} \prod_{i=1}^{k_1} \left\{ 1 - I_{F(u+\theta_0-\theta'_i-b)}(m+1,m+1) \right\} [F(u)]^m\,[1-F(u)]^m\,f(u)\,du$$

$$\ge \frac{(2m+1)!}{(m!)^2} \int_{-\infty}^{\infty} \left[ 1 - I_{F(u-b)}(m+1,m+1) \right]^{k} [F(u)]^m\,[1-F(u)]^m\,f(u)\,du \tag{4.1.4}$$

The constant $b = b(k, m, P^*)$ is obtained by equating the right hand side of
(4.1.4) to $P^*$.

The expected subset size for the rule $R_b$ is obtained as in Case (a).


Remarks:

(i) It can be seen from the expressions for $P(\text{select } \pi_i)$ for the
rules $R_a$ and $R_b$ that, in either case,

$$P(\text{select } \pi_i) \ge P(\text{select } \pi_j) \quad \text{if } \theta_i \ge \theta_j.$$

(ii) If $\theta_i \to \infty$ for all $i = 1, \ldots, k$, and $\theta_0$ is
finite, then $E(S) \to k$ in each case.

In the following sections we will consider several specific densities
of location type and investigate, in some detail, rules based on sample medians
for selection problems connected with them. As pointed out earlier, the
behavior of the proposed selection rule seems to depend on the type of tails
of the underlying distributions. It is known [see, for example, Hajek (1969)]
that among normal, logistic and double exponential distributions, the normal
distribution has the shortest or the thinnest tails, followed by the logistic
and double exponential distributions, in that order. Subset selection procedures
based on sample medians for double exponential populations have been
investigated by Gupta and Leong (1976). McDonald (1977) has investigated a
medians procedure for logistic populations. We will be mainly concerned with
normal and double exponential populations, but we will include some results for
the logistic distribution.


5.0  Normal  Populations 

Gupta  (1956,  1965)  has  considered  the  problem  of  selecting  a subset 
containing  the  largest  of  several  normal  means,  and  has  investigated  rules 
based  on  sufficient  statistics,  namely  the  sample  means,  assuming  a common 
known  variance.  It  is  well  known  that  the  sample  mean  is  a uniformly  minimum 
variance  unbiased  estimate  of  the  normal  mean,  and  therefore  it  should  provide 
a better  selection  rule  than  the  rule  of  Section  3.  In  the  next  section  we 
study  the  normal  case  in  order  to  get  an  idea  of  how  far  off  the  medians 
procedure  is  from  the  means  procedure. 


5.1. Normal Populations with Common Known Variance: A Procedure Based on
Sample Medians for Selecting a Subset Containing the Largest Normal Mean.

Let $\pi_1, \ldots, \pi_k$ be $k$ ($\ge 2$) independent normal populations with
means $\theta_1, \ldots, \theta_k$ and a common known variance $\sigma^2$. Let
$\tilde{X}_i$ be the median of $n = 2m+1$ ($m \ge 1$) observations from $\pi_i$
($i = 1, \ldots, k$). The pdf $g(x,\theta_i)$ and the cdf $G(x,\theta_i)$ are
obtained from (2.0.1) and (2.0.2) by the substitution
$f(x,\theta_i) = \phi(x-\theta_i)$ and $F(x,\theta_i) = \Phi(x-\theta_i)$, where
$\phi(\cdot)$ and $\Phi(\cdot)$ denote the pdf and cdf, respectively, of a
standard normal distribution.

For the problem of selecting a subset containing the population associated
with the largest mean, consider the following procedure:

$$R_1:\ \text{select } \pi_i \text{ iff } \tilde{X}_i \ge \tilde{X}_{[k]} - d_1\sigma \tag{5.1.1}$$

where $d_1$ ($\ge 0$), the smallest constant to satisfy the basic
$P^*$-condition, is given by the following equation:

$$\frac{(2m+1)!}{(m!)^2} \int_{-\infty}^{\infty} \left[ I_{\Phi(u+d_1)}(m+1,m+1) \right]^{k-1} \Phi^m(u)\,[1-\Phi(u)]^m\,\phi(u)\,du = P^* \tag{5.1.2}$$

5.2. Some Properties of the Rule $R_1$.

The expressions for $E_\theta(S|R_1)$, the expected subset size, and
$E_\theta(S_r|R_1)$, the expected sum of ranks, in using the rule $R_1$ can be
obtained from (3.1.4), (3.1.5) and (3.1.6) by substituting $F(u) = \Phi(u)$ and
$f(u) = \phi(u)$. Since the normal density has the MLR property, the rule $R_1$
has all the properties mentioned in Section 3.2 for the general rule $R$.


5.3. Comparison between $R_1$ and Gupta's Selection Procedure Based on Sample
Means when the Normal Means are Equally Spaced.

Let $\pi_1, \ldots, \pi_k$ be $k$ independent normal populations with means
$\theta, \theta + \delta\sigma, \ldots, \theta + (k-1)\delta\sigma$ and a common
known variance $\sigma^2$, where $\delta > 0$ is a known constant. Let $X_{ij}$
($j = 1, \ldots, n$) be a sample of size $n = 2m+1$ ($m \ge 1$) from $\pi_i$
($i = 1, \ldots, k$), and let $\tilde{X}_i$ and $\bar{X}_i$ be the median and
the sample mean of the observations from $\pi_i$.

For the problem of selecting a subset containing the largest normal mean,
namely $\theta + (k-1)\delta\sigma$, Gupta (1965) has proposed the following
rule:

$$R_g:\ \text{select } \pi_i \text{ iff } \bar{X}_i \ge \bar{X}_{[k]} - \frac{d\sigma}{\sqrt{n}} \tag{5.3.1}$$

where the constant $d$ satisfying the $P^*$-condition is given by

$$\int_{-\infty}^{\infty} \Phi^{k-1}(u+d)\,\phi(u)\,du = P^* \tag{5.3.2}$$


It should be observed that, unlike $d_1$, the constant $d$ does not depend on
$n$. We will compare the rule $R_1$ defined by (5.1.1) and (5.1.2) to Gupta's
rule $R_g$.

Let $P(i,k,P^*,\delta,n|R)$ denote the probability with which a rule $R$ selects
the population of rank $i$, that is, the population with mean
$\theta + (i-1)\delta\sigma$ ($i = 1, \ldots, k$). Then from Gupta (1965) we have

$$P(i,k,P^*,\delta,n|R_g) = P(i,k,P^*,\delta\sqrt{n}\,|R_g) = \int_{-\infty}^{\infty} \Big[ \prod_{j=1,\, j \ne i}^{k} \Phi\big(u+d-(j-i)\delta\sqrt{n}\big) \Big] \phi(u)\,du. \tag{5.3.3}$$

Also, for the rule $R_1$, we have

$$P(i,k,P^*,\delta,n|R_1) = \frac{(2m+1)!}{(m!)^2} \int_{-\infty}^{\infty} H(u|i,k,P^*,\delta,n)\,\Phi^m(u)\,[1-\Phi(u)]^m\,\phi(u)\,du \tag{5.3.4}$$

where

$$H(u|i,k,P^*,\delta,n) = \prod_{j=1,\, j \ne i}^{k} I_{\Phi(u+d_1-(j-i)\delta)}(m+1,m+1).$$

Next, let $\Psi(k,P^*,\delta,n|R)$ and $\Psi_1(k,P^*,\delta,n|R)$ denote the
expected sum of ranks and expected average rank of the populations in the
selected subset, respectively. Then

$$\Psi(k,P^*,\delta,n|R) = \sum_{i=1}^{k} i\,P(i,k,P^*,\delta,n|R) = k\,\Psi_1(k,P^*,\delta,n|R). \tag{5.3.5}$$

Tables of $P(i,k,P^*,\delta\sqrt{n}\,|R_g)$ and the expected proportion of the
populations retained in the subset ($= E(S|R_g)/k$) are available in Gupta
(1965). We have computed the values of these functions for $R_1$ for
$k = 2(1)5$, $n = 3, 5$, $\delta\sqrt{n} = 0.5(0.5)5.0$ and $P^* = .90, .95$.
The numerical integration was performed on a CDC 6500 using Gauss-Hermite
quadrature based on twenty nodes. These tables are given at the end of this
section. For example, for $P^* = .90$, $k = 5$, $n = 3$, $\delta = 1.5/\sqrt{n}$,
the rule $R_g$ based on sample means selects the second best and third best
populations with probabilities .781 and .357, respectively. The corresponding
probabilities for the rule $R_1$ are .822 and .467, in that order. The
probability of a correct selection (selecting the best) has to be at least .90
for both rules and is actually equal to .998 for the rule $R_g$ and .997 for the
rule $R_1$. The expected average rank of the selected subset and the expected
proportion of the populations selected in the subset for the rule $R_g$ are 1.86
and .441, respectively. The corresponding values for the rule $R_1$ are 1.995
and .489.

It appears from these tables that the rule $R_g$ based on sample means is
superior to the rule $R_1$ based on sample medians, and also, as expected, the
performance of $R_g$ relative to $R_1$ improves as sample size is increased.

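Remarks:

(1) For fixed $P^*$ and $k$,

$$P(1,k,P^*,\delta,n|R_1) \downarrow \text{ in } \delta\sqrt{n}, \qquad P(k,k,P^*,\delta,n|R_1) \uparrow \text{ in } \delta\sqrt{n}.$$

(2) For fixed $P^*$, $i$, $\delta$ and $n$,

$$P(i,k,P^*,\delta,n|R_1) \downarrow \text{ in } k, \quad 1 \le i \le k.$$

(3) For fixed $k$, $P^*$ and $(i-j)\delta$,

$$\lim_{n \to \infty} \Psi(k,P^*,\delta,n) = k.$$

(ii) It follows from (5.3.4) and (5.3.5) that

$$\text{(A)} \quad \Psi(k,P^*,\delta,n|R_1) \ge \frac{(2m+1)!}{(m!)^2} \sum_{i=1}^{k} i \int_{-\infty}^{\infty} H(u|1,k,P^*,\delta,n)\,\Phi^m(u)\,[1-\Phi(u)]^m\,\phi(u)\,du$$

where the function $H$ is as defined in (5.3.4).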
5.4. A Selection Rule Based on Medians from Large Samples

Let $f(x)$ be the pdf of a continuous random variable $X$ and let $\theta$ be
its unique median. Then the distribution of $\tilde{X}$, the sample median based
on $n = 2m+1$ ($m \ge 1$) independent observations on $X$, is known to be
asymptotically $N(\theta, [4(f(\theta))^2(2m+1)]^{-1})$ provided certain
regularity conditions on $f(x)$ are met [see Cramer (1946)]. This result will be
used to investigate procedures based on medians from large samples.

Using the notations of Section 2.0, we see that, for large samples from normal
populations $N(\theta_i, \sigma^2)$, $i = 1, \ldots, k$, $\tilde{X}_i$ is
approximately distributed as $N(\theta_i, \pi\sigma^2/2(2m+1))$. For the problem
of selecting a subset containing the largest mean $\theta_{[k]}$, we propose the
following rule:

$$R_2:\ \text{select } \pi_i \text{ iff } \tilde{X}_i \ge \tilde{X}_{[k]} - \frac{d_2\,\sigma\sqrt{\pi}}{\sqrt{2(2m+1)}} \tag{5.4.1}$$

where the constant $d_2 \ge 0$ is given by

$$\int_{-\infty}^{\infty} \Phi^{k-1}(u+d_2)\,\phi(u)\,du = P^*. \tag{5.4.2}$$

Tables of values of $d_2$ satisfying (5.4.2) are available in Bechhofer (1954)
for $k = 1(1)10$ and in Gupta (1963) for $k = 1(1)51$ [see Table 1 of Gupta,
which gives values of $d_2/\sqrt{2}$ for $n = k-1 = 1(1)50$].
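When published tables are not at hand, the constant in (5.4.2) can be computed
directly. The sketch below is an assumption-laden illustration, not the
authors' procedure: it uses Simpson's rule on $[-8, 8]$ and bisection, and the
names `pcs_normal`, `solve_d2` and `r2_margin` are hypothetical.

```python
from math import pi, sqrt
from statistics import NormalDist

STD = NormalDist()


def pcs_normal(d, k, lo=-8.0, hi=8.0, steps=2000):
    """Left side of (5.4.2): integral of Phi(u + d)^(k-1) * phi(u) du."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += w * STD.cdf(u + d) ** (k - 1) * STD.pdf(u)
    return total * h / 3.0


def solve_d2(p_star, k, tol=1e-7):
    """Bisection for the d2 >= 0 satisfying (5.4.2)."""
    lo, hi = 0.0, 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pcs_normal(mid, k) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def r2_margin(d2, sigma, m):
    """Amount subtracted from the largest sample median in rule R2, eq. (5.4.1)."""
    return d2 * sigma * sqrt(pi / (2.0 * (2 * m + 1)))


d2 = solve_d2(0.90, k=4)
print(d2, r2_margin(d2, sigma=1.0, m=10))
```

For $k = 2$ the integral in (5.4.2) reduces to $\Phi(d_2/\sqrt{2})$, so the
numerical root can be compared with $\sqrt{2}$ times the $P^*$ quantile of the
standard normal distribution.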

The expression for $P(i,k,P^*,\delta,n|R_2)$, the probability of selecting the
population associated with $\theta_{[i]}$ for the rule $R_2$, is

$$P(i,k,P^*,\delta,n|R_2) = \int_{-\infty}^{\infty} h(u|i,k,P^*,\delta,n)\,\phi(u)\,du \tag{5.4.3}$$

where

$$h(u|i,k,P^*,\delta,n) = \prod_{j \ne i} \Phi\Big(u + d_2 + [\theta_{[i]} - \theta_{[j]}]\,\frac{\sqrt{2(2m+1)}}{\sigma\sqrt{\pi}}\Big) \tag{5.4.4}$$

5.5 Asymptotic Relative Efficiency (ARE) of the Rule $R_2$ Relative to Gupta's Rule

In this section we compute the ARE of the rule $R_2$ with respect to Gupta's
rule in the following two cases: (i) independent normal populations and (ii)
independent contaminated normal populations. We consider the two cases
separately.

(i) Independent Normal Populations

Let $\pi_1, \ldots, \pi_k$ be $k$ independent normal populations with means
$\theta_1, \ldots, \theta_k$, respectively, and a common known variance
$\sigma^2$. Assume that $\theta_i$ ($i = 1, \ldots, k$) are in the following
slippage configuration:

$$\theta_i = \begin{cases} \theta + \sigma\Delta & \text{if } i = i_0;\ \Delta > 0 \text{ unknown} \\ \theta & \text{if } i \ne i_0. \end{cases}$$

The index $i_0$ ($1 \le i_0 \le k$) is not known. The population $\pi_{i_0}$ is
the best population.

Our interest is in the relative performance of the following two selection
procedures:

$$R_2:\ \text{select } \pi_i \text{ iff } \tilde{X}_i \ge \tilde{X}_{[k]} - \frac{d_2\,\sigma\sqrt{\pi}}{\sqrt{2n}}$$

$$R_g:\ \text{select } \pi_i \text{ iff } \bar{X}_i \ge \bar{X}_{[k]} - \frac{d\sigma}{\sqrt{n}}.$$

The constants $d$ and $d_2$ satisfying the basic $P^*$-condition are both given
by (5.4.2) and hence we have

$$d_2 = d \tag{5.5.1}$$

Let $S^*$ be the number of non-best populations in the selected subset.
Then small values of $S^*$ are desirable and therefore, consistent with the
basic $P^*$-condition, we would like to keep the expected value of $S^*$ as
small as possible.

It is intuitively clear that the performance of any reasonable selection
rule should improve as the sample size is increased. For a given $\epsilon$
($0 < \epsilon < 1$), let $N_{R'}(\epsilon)$ be the number of observations
needed so that

$$E(S^*|R') = \epsilon \tag{5.5.2}$$

We will use the following definition of ARE [see Barlow and Gupta (1969)]:

Definition 5.5.1: The ARE of the rule $R_2$ relative to the rule $R_g$ is
defined as

$$ARE(R_2, R_g) = \lim_{\epsilon \to 0} \frac{N_{R_g}(\epsilon)}{N_{R_2}(\epsilon)}.$$

Now using the definitions of $N_{R_2}(\epsilon)$ and $N_{R_g}(\epsilon)$, it can
be shown that

$$\int_{-\infty}^{\infty} \Big[ \Phi\big(u - \Delta\sqrt{2N_{R_2}(\epsilon)/\pi} + d\big) - \Phi\big(u - \Delta\sqrt{N_{R_g}(\epsilon)} + d\big) \Big] \Phi^{k-2}(u+d)\,\phi(u)\,du = 0 \tag{5.5.5}$$

Using the fact that $\Phi$ is strictly increasing, it can be seen from (5.5.5)
that

$$\frac{2N_{R_2}(\epsilon)}{\pi} = N_{R_g}(\epsilon)$$

and hence

$$ARE(R_2, R_g) = \lim_{\epsilon \to 0} \frac{N_{R_g}(\epsilon)}{N_{R_2}(\epsilon)} = \frac{2}{\pi}.$$
(ii) Independent Contaminated Normal Populations

Suppose that in the course of sampling from population $\pi_i$
($i = 1, \ldots, k$) something happens to the system and gives rise to some wild
observations. Assume that the pdf of $\pi_i$ can be written as

$$f(x,\theta_i) = \alpha f_1(x,\theta_i) + (1-\alpha) f_2(x,\theta_i), \quad 0 < \alpha < 1 \tag{5.5.6}$$

This means that the experimenter is sampling from a population with
pdf $f_1(x,\theta_i)$ $100\alpha\%$ of the time, and from $f_2(x,\theta_i)$
$100(1-\alpha)\%$ of the time. The presence of observations from
$f_2(x,\theta_i)$ is termed contamination. For our discussion we will assume
that


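$$f_1(x,\theta_i) = \frac{1}{\sigma}\,\phi\Big(\frac{x-\theta_i}{\sigma}\Big), \qquad f_2(x,\theta_i) = \frac{1}{\sigma\sqrt{b}}\,\phi\Big(\frac{x-\theta_i}{\sigma\sqrt{b}}\Big), \qquad i = 1, \ldots, k$$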

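where $b$ is a positive constant. We will also assume that the means $\theta_i$
are in the same slippage configuration as in Case (i).

Now, it is known [see, for example, Rohatgi (1976)] that $\tilde{X}_i$ and
$\bar{X}_i$ both are asymptotically normal, each with mean $\theta_i$ and
variances $\tilde{\sigma}^2$ and $\bar{\sigma}^2$, respectively, where

$$\tilde{\sigma}^2 = \frac{\pi\sigma^2}{2n} \Big\{ \alpha + \frac{1-\alpha}{\sqrt{b}} \Big\}^{-2} \tag{5.5.7}$$

$$\bar{\sigma}^2 = \frac{\sigma^2}{n}\,[\alpha + (1-\alpha)b] \tag{5.5.8}$$

For the problem of selecting a subset containing the best population,
consider the following two rules:

$$R_2':\ \text{select } \pi_i \text{ iff } \tilde{X}_i \ge \tilde{X}_{[k]} - d_2\,\tilde{\sigma}$$

$$R^*:\ \text{select } \pi_i \text{ iff } \bar{X}_i \ge \bar{X}_{[k]} - d^*\,\bar{\sigma}.$$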

It is easily seen that the constants $d_2 \ge 0$ and $d^* \ge 0$ both satisfy
the equation (5.4.2) and hence $d_2 = d^*$. Hence

$$ARE(R_2', R^*) = \lim_{\epsilon \to 0} \frac{N_{R^*}(\epsilon)}{N_{R_2'}(\epsilon)} = \frac{2}{\pi}\,[\alpha + (1-\alpha)b]\,\Big[\alpha + \frac{1-\alpha}{\sqrt{b}}\Big]^2 \to \infty \ \text{ as } b \to \infty.$$

The above result shows that for $\alpha < 1$ and large values of $b$ the rule
$R_2'$ based on sample medians is much better than the rule $R^*$ based on
sample means. In fact, it can be seen from a result in Rohatgi (1976) that
$ARE(R_2', R^*)$ is close to 1 when $b = 9$ and $\alpha = .915$, and as the
differences $b - 9 > 0$ and/or $.915 - \alpha > 0$ increase, the rule $R_2'$
shows a considerable improvement over $R^*$ in terms of the ARE.
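The dependence of this ARE on the contamination parameters is easy to tabulate.
The following sketch assumes, as the variance expressions (5.5.7) and (5.5.8)
suggest, that the contaminating component is normal with variance $b\sigma^2$
and mixing weight $1-\alpha$, so that the ARE is the ratio of the asymptotic
variance of the mean to that of the median. The function name is hypothetical.

```python
from math import pi, sqrt


def are_contaminated(alpha, b):
    """Ratio (5.5.8)/(5.5.7): asymptotic variance of the sample mean divided by
    that of the sample median, for the mixture alpha*N(theta, sigma^2) +
    (1 - alpha)*N(theta, b*sigma^2). The factor sigma^2/n cancels."""
    var_mean = alpha + (1.0 - alpha) * b
    density_factor = alpha + (1.0 - alpha) / sqrt(b)
    return (2.0 / pi) * var_mean * density_factor**2


# The ARE grows with b for a fixed contamination fraction 1 - alpha.
for b in (1.0, 9.0, 25.0, 100.0):
    print(b, round(are_contaminated(0.915, b), 3))
```

With no contamination ($\alpha = 1$ or $b = 1$) the expression falls back to
the normal-case value $2/\pi$.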


5.6 A Test of Homogeneity Based on $\tilde{X}_{[k]} - \tilde{X}_{[1]}$

Let $\pi_1, \ldots, \pi_k$ be $k$ independent normal populations with means
$\theta_1, \ldots, \theta_k$, respectively, and a common known variance
$\sigma^2$. As before, let $\tilde{X}_i$ be the sample median of $n = 2m+1$
($m \ge 1$) independent observations from $\pi_i$ ($i = 1, \ldots, k$), and let
$\tilde{X}_{[1]} \le \cdots \le \tilde{X}_{[k]}$ be their ordered values. For
the hypothesis of homogeneity

$$H_0:\ \theta_1 = \cdots = \theta_k$$

we propose the following test procedure:

$$\text{Reject } H_0 \text{ if } R = \tilde{X}_{[k]} - \tilde{X}_{[1]} > y \tag{5.6.1}$$

where the constant $y$ is obtained from the size-condition:

$$P_{H_0}(R > y) \le \alpha.$$

Here $\alpha$ is the size of the test.

The following theorem gives the constant $y$, and also establishes
a relationship between the test given by (5.6.1) and the selection procedure
of Section 5.1.

Theorem 5.6.1

For $0 < \alpha < 1$, let $y$ satisfy

%/h  i *[k]  - ■<)  1 ' - f 

Then

$$P_{H_0}(R > y) \le \alpha.$$

Proof: The proof is similar to that of Theorem 6.1 of Gupta and Leong
(1977), and hence omitted.

5.7 On the Distribution of the Statistic Associated with $R_b$ when the
Underlying Distributions are Normal

Let $\tilde{X}_i$ ($i = 0, 1, \ldots, k$) be sample medians of $(k+1)$ sets of
$n = 2m+1$ ($m \ge 1$) independent observations from a standard normal
distribution.

Define

$$Z_i = \tilde{X}_i - \tilde{X}_0 \quad (i = 1, \ldots, k).$$

The random variables $Z_1, \ldots, Z_k$ are correlated and the distribution of
$Z = \max_{1 \le i \le k} Z_i$ is needed in some ranking and selection problems.
For standard double exponential populations the distribution of $Z$ has been
computed by Gupta and Leong (1977) for selected values of $k$, $n$ and $\alpha$.

In this section we give an expression for the distribution function of $Z$
and also provide a short table for its upper percentage points for
$P^* = 1 - \alpha = .75, .85, .90, .95, .99$; $k = 2(1)5$, $n = 3(2)11$.

Let $F(\cdot)$ be the cdf of $Z$. Then

$$F(z) = P(Z \le z) = P(\tilde{X}_i \le \tilde{X}_0 + z,\ i = 1, \ldots, k)$$

$$= \frac{(2m+1)!}{(m!)^2} \int_{-\infty}^{\infty} \left[ I_{\Phi(x+z)}(m+1,m+1) \right]^{k} \Phi^m(x)\,[1-\Phi(x)]^m\,\phi(x)\,dx. \tag{5.7.1}$$

Computations for upper percentage points of $F$ were done on a CDC 6500
using Gauss-Hermite quadrature based on 20 nodes to perform the required
numerical integration.

6.0 Logistic and Double Exponential Distributions

The logistic distribution is used frequently as a model in economic and
demographic problems, and also as a growth curve. The logistic curve,
although very similar in shape to the normal curve, is different in many
ways. It has a heavier tail than the normal, and it does not belong to the
Pearsonian or Exponential families of distributions [see Patel, Kapadia
and Owen (1976)].

The problem of selection of a subset containing the largest location
parameter of several logistic populations has been investigated in detail
by McDonald (1977). For selected values of $k$, $n$ and $P^*$, values of the
constant $d$ required for the rule $R$ have been computed. McDonald has also
compared the medians procedure to the means procedure and has found the
ARE in the logistic case to be $\pi^2/12$. In this sense the rule based on
sample medians fares a little better in the logistic case than it does
in the normal means problem.
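The three ARE values quoted in this report ($2/\pi$ for the normal, $\pi^2/12$
for the logistic and 2 for the double exponential) all agree with the ratio of
the asymptotic variance of the sample mean to that of the sample median, which
for a density $f$ with variance $\sigma^2$ and median $\theta$ equals
$4\sigma^2 f(\theta)^2$. The sketch below checks this numerically; it assumes
the standard forms of the three densities, and the function name is
hypothetical.

```python
from math import pi, sqrt


def are_median_vs_mean(density_at_median, variance):
    """(variance / n) divided by 1 / (4 f(theta)^2 n); the sample size cancels."""
    return 4.0 * variance * density_at_median**2


# Standard forms: f(0) and the variance of each distribution.
cases = {
    "normal": (1.0 / sqrt(2.0 * pi), 1.0),
    "logistic": (0.25, pi**2 / 3.0),
    "double exponential": (0.5, 2.0),
}

for name, (f0, var) in cases.items():
    print(name, are_median_vs_mean(f0, var))
```

The ordering of the three values follows the ordering of tail weight noted
earlier: the heavier the tails, the better the medians procedure performs
relative to the means procedure.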


TABLE I

Upper 100(1-P*) percentage points of $Z = \max_{1 \le i \le k} (\tilde{X}_i - \tilde{X}_0)$,
where $\tilde{X}_0, \tilde{X}_1, \ldots, \tilde{X}_k$ are iid sample median
random variables in samples of size $n = 2m+1$ ($m \ge 1$) from the standard
normal distribution.

  k    P*     n = 3    n = 5    n = 7    n = 9    n = 11

  1   .75      .638     .511     .445     .409     .393
      .85      .980     .784     .676     .614     .582
      .90     1.213     .969     .832     .751     .710
      .95     1.558    1.245    1.065     .956     .900
      .99     2.208    1.766    1.522    1.398    1.491

  2   .75      .959     .768     .667     .610     .579
      .85     1.276    1.019     .876     .792     .744
      .90     1.493    1.192    1.019     .915     .855
      .95     1.816    1.452    1.239    1.105    1.030
      .99     2.429    1.943    1.676    1.533    1.606

  3   .75     1.125     .901     .783     .715     .675
      .85     1.432    1.142     .980     .884     .828
      .90     1.642    1.310    1.117    1.000     .931
      .95     1.854    1.563    1.333    1.184    1.099
      .99     2.551    2.040    1.761    1.609    1.671

  4   .75     1.235     .989     .859     .784     .738
      .85     1.536    1.223    1.048     .945     .883
      .90     1.742    1.389    1.182    1.057     .982
      .95     2.049    1.639    1.396    1.238    1.145
      .99     2.634    2.106    1.819    1.661    1.715

For given $k$, $n$ and $P^*$ = .75 (top), .85 (second), .90 (third),
.95 (fourth), .99 (bottom), the entries in this table are the
values of $d$ which satisfy

$$\int_{-\infty}^{\infty} G^k(x+d)\,g(x)\,dx = P^*$$

where  G(-)  is  the  cdf  and  g(-)  the  pdf  of  the  median  of  a sample 
of  size  n from  a standard  normal  population;  n > 3 is  an  odd  integer. 
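
The equation defining d in the note to Table I can be solved numerically. The following is a minimal sketch, assuming composite Simpson integration on [-8, 8] and bisection on d; neither choice is stated in the source, and all function names are illustrative.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def median_cdf(x: float, m: int) -> float:
    """cdf G of the median of n = 2m+1 iid standard normal observations."""
    n = 2 * m + 1
    p = norm_cdf(x)
    # The median is <= x exactly when at least m+1 observations are <= x.
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(m + 1, n + 1))


def median_pdf(x: float, m: int) -> float:
    """pdf g of the same sample median."""
    n = 2 * m + 1
    p = norm_cdf(x)
    c = math.factorial(n) / math.factorial(m) ** 2
    return c * (p * (1.0 - p)) ** m * norm_pdf(x)


def coverage(d: float, k: int, m: int, lo=-8.0, hi=8.0, steps=800) -> float:
    """Composite Simpson approximation to the integral of G^k(x+d) g(x)."""
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps + 1):
        x = lo + s * h
        w = 1 if s in (0, steps) else (4 if s % 2 else 2)
        total += w * median_cdf(x + d, m) ** k * median_pdf(x, m)
    return total * h / 3.0


def solve_d(p_star: float, k: int, m: int, tol=1e-6) -> float:
    """Bisection for the d with coverage(d) = P*; coverage increases with d."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid, k, m) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


d = solve_d(0.75, k=1, m=1)  # n = 3, k = 1, P* = .75
print(round(d, 3))
```

The printed value can be compared with the first entry of Table I (k = 1, n = 3, P* = .75).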

TABLE IIA

For the rule $R_1$ and the configuration $(\theta, \theta + \delta\sigma, \ldots, \theta + (k-1)\delta\sigma)$ this
table gives the probability of selecting the normal population with rank
i when the population with mean $\theta + (i-1)\delta\sigma$ has rank i, i = 1, 2, ..., k; the
common variance $\sigma^2$ is assumed to be known.

P* = .90, n = 3

         $\delta\sqrt{n}$
 k  i      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2  1    .836   .749   .643   .524   .404   .291   .196   .123   .072   .039
    2    .944   .971   .986   .994   .997   .999  1.000  1.000  1.000  1.000

 3  1    .778   .585   .359   .171   .061   .016   .003   .000   .000   .000
    2    .882   .828   .745   .640   .521   .400   .288   .194   .121   .070
    3    .959   .983   .993   .997   .999  1.000  1.000  1.000  1.000  1.000

 4  1    .704   .381   .118   .019   .001   .000   .000   .000   .000   .000
    2    .818   .648   .423   .217   .084   .024   .005   .001   .000   .000
    3    .907   .865   .793   .697   .583   .462   .344   .240   .156   .094
    4    .969   .989   .996   .998   .999  1.000  1.000  1.000  1.000  1.000

 5  1    .612   .193   .019   .001   .000   .000   .000   .000   .000   .000
    2    .740   .425   .142   .024   .002   .000   .000   .000   .000   .000
    3    .845   .688   .467   .251   .102   .031   .007   .001   .000   .000
    4    .923   .887   .822   .733   .624   .504   .384   .274   .183   .113
    5    .975   .992   .997   .999  1.000  1.000  1.000  1.000  1.000  1.000

TABLE IIIA

For the rule $R_1$ and the configuration $(\theta, \theta + \delta\sigma, \ldots, \theta + (k-1)\delta\sigma)$ this table
gives the expected average rank of the selected subset (top) and the expected
proportion of the populations selected in the subset (bottom) when the normal
population with mean $\theta + (i-1)\delta\sigma$ has rank i, i = 1, 2, ..., k; the common variance $\sigma^2$
is assumed to be known.

P* = .90, n = 3

      $\delta\sqrt{n}$
 k      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2   1.361  1.345  1.307  1.256  1.199  1.145  1.098  1.061  1.036  1.019
      .890   .860   .814   .759   .701   .645   .598   .562   .536   .519

 3   1.806  1.730  1.610  1.481  1.367  1.272  1.193  1.129  1.081  1.047
      .873   .799   .699   .603   .527   .472   .430   .398   .374   .357

 4   2.234  2.057  1.831  1.634  1.479  1.358  1.261  1.180  1.117  1.071
      .849   .721   .582   .483   .417   .372   .337   .310   .289   .274

 5   2.639  2.323  1.995  1.745  1.561  1.422  1.311  1.220  1.146  1.091
      .819   .637   .489   .401   .346   .307   .278   .255   .237   .223
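
The two kinds of table are linked: the expected proportion of populations selected is the average over i of the individual selection probabilities. The short check below applies this identity to the k = 3 rows of Tables IIA and IIIA (P* = .90, n = 3); the variable names are illustrative and the tolerance only allows for three-decimal rounding.

```python
# Selection probabilities for k = 3 from Table IIA, rows i = 1, 2, 3.
table_iia_k3 = [
    [.778, .585, .359, .171, .061, .016, .003, .000, .000, .000],
    [.882, .828, .745, .640, .521, .400, .288, .194, .121, .070],
    [.959, .983, .993, .997, .999, 1.000, 1.000, 1.000, 1.000, 1.000],
]
# Expected proportion selected for k = 3 from Table IIIA (bottom row).
table_iiia_k3 = [.873, .799, .699, .603, .527, .472, .430, .398, .374, .357]

k = len(table_iia_k3)
# Average the selection probabilities column by column.
recomputed = [sum(column) / k for column in zip(*table_iia_k3)]
largest_gap = max(abs(r - t) for r, t in zip(recomputed, table_iiia_k3))

print([round(r, 3) for r in recomputed])
print(f"largest discrepancy: {largest_gap:.4f}")
```

The same check can be used on any other pair of tables in this section to catch transcription errors.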

TABLE IIB

For the rule $R_1$ and the configuration $(\theta, \theta + \delta\sigma, \ldots, \theta + (k-1)\delta\sigma)$ this table
gives the probability of selecting the normal population with rank i when the
population with mean $\theta + (i-1)\delta\sigma$ has rank i, i = 1, 2, ..., k; the common variance
$\sigma^2$ is assumed to be known.

P* = .95, n = 3

         $\delta\sqrt{n}$
 k  i      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2  1    .910   .850   .768   .665   .548   .427   .312   .213   .136   .080
    2    .974   .988   .995   .998   .999  1.000  1.000  1.000  1.000  1.000

 3  1    .871   .718   .500   .277   .118   .037   .009   .001   .000   .000
    2    .938   .902   .842   .758   .653   .535   .414   .301   .204   .129
    3    .982   .993   .998   .999  1.000  1.000  1.000  1.000  1.000  1.000

 4  1    .784   .477   .173   .033   .003   .000   .000   .000   .000   .000
    2    .875   .732   .517   .292   .126   .041   .010   .002   .000   .000
    3    .940   .909   .851   .770   .668   .551   .430   .315   .216   .138
    4    .982   .994   .998   .999  1.000  1.000  1.000  1.000  1.000  1.000

 5  1    .742   .306   .043   .002   .000   .000   .000   .000   .000   .000
    2    .842   .565   .235   .052   .006   .000   .000   .000   .000   .000
    3    .914   .798   .602   .369   .176   .063   .016   .003   .000   .000
    4    .961   .938   .894   .828   .739   .631   .512   .391   .281   .188
    5    .990   .997   .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000

TABLE IIIB

For the rule $R_1$ and the configuration $(\theta, \theta + \delta\sigma, \ldots, \theta + (k-1)\delta\sigma)$ this table gives
the expected average rank of the selected subset (top) and the expected proportion
of the populations selected in the subset (bottom) when the normal population with
mean $\theta + (i-1)\delta\sigma$ has rank i, i = 1, 2, ..., k; the common variance $\sigma^2$ is assumed to
be known.

P* = .95, n = 3

      $\delta\sqrt{n}$
 k      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2   1.429  1.413  1.379  1.330  1.273  1.213  1.156  1.107  1.068  1.040
      .942   .919   .881   .831   .774   .713   .656   .607   .568   .540

 3   1.898  1.834  1.725  1.597  1.474  1.369  1.279  1.201  1.136  1.086
      .930   .871   .780   .678   .590   .524   .474   .434   .401   .376

 4   2.321  2.161  1.938  1.731  1.565  1.434  1.327  1.237  1.162  1.103
      .895   .778   .635   .523   .449   .398   .360   .329   .304   .284

 5   2.792  2.514  2.178  1.904  1.699  1.543  1.419  1.315  1.225  1.150
      .890   .721   .555   .450   .384   .339   .306   .279   .256   .238

TABLE IIC

For the rule $R_1$ and the configuration $(\theta, \theta + \delta\sigma, \ldots, \theta + (k-1)\delta\sigma)$ this table
gives the probability of selecting the normal population with rank i when the
population with mean $\theta + (i-1)\delta\sigma$ has rank i, i = 1, 2, ..., k; the common variance
$\sigma^2$ is assumed to be known.

P* = .90, n = 5

         $\delta\sqrt{n}$
 k  i      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2  1    .838   .754   .653   .539   .422   .312   .216   .139   .084   .047
    2    .943   .969   .985   .993   .997   .999  1.000  1.000  1.000  1.000

 3  1    .782   .596   .379   .191   .073   .021   .005   .001   .000   .000
    2    .882   .832   .753   .652   .538   .422   .311   .215   .139   .084
    3    .958   .982   .993   .997   .999  1.000  1.000  1.000  1.000  1.000

 4  1    .710   .400   .134   .024   .002   .000   .000   .000   .000   .000
    2    .822   .657   .442   .239   .098   .031   .007   .001   .000   .000
    3    .907   .868   .799   .708   .599   .483   .368   .264   .177   .110
    4    .967   .988   .995   .998   .999  1.000  1.000  1.000  1.000  1.000

 5  1    .620   .214   .025   .001   .000   .000   .000   .000   .000   .000
    2    .746   .444   .161   .031   .003   .000   .000   .000   .000   .000
    3    .848   .698   .486   .274   .119   .039   .010   .002   .000   .000
    4    .922   .890   .828   .743   .639   .525   .408   .299   .205   .131
    5    .974   .991   .996   .999   .999  1.000  1.000  1.000  1.000  1.000

TABLE IIIC

For the rule $R_1$ and the configuration $(\theta, \theta + \delta\sigma, \ldots, \theta + (k-1)\delta\sigma)$ this table gives
the expected average rank of the selected subset (top) and the expected proportion
of the populations selected in the subset (bottom) when the normal population with
mean $\theta + (i-1)\delta\sigma$ has rank i, i = 1, 2, ..., k; the common variance $\sigma^2$ is assumed
to be known.

P* = .90, n = 5

      $\delta\sqrt{n}$
 k      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2   1.361  1.346  1.311  1.262  1.208  1.155  1.107  1.069  1.042  1.023
      .890   .862   .819   .766   .710   .655   .608   .569   .542   .523

 3   1.807  1.735  1.621  1.495  1.382  1.288  1.209  1.144  1.093  1.056
      .874   .803   .708   .613   .537   .481   .439   .405   .380   .361

 4   2.236  2.068  1.850  1.654  1.499  1.378  1.280  1.198  1.132  1.083
      .851   .728   .593   .492   .425   .378   .344   .316   .294   .278

 5   2.643  2.342  2.019  1.770  1.583  1.443  1.332  1.240  1.164  1.105
      .822   .647   .499   .409   .352   .313   .284   .260   .241   .226

TABLE IID

For the rule $R_1$ and the configuration $(\theta, \theta + \delta\sigma, \ldots, \theta + (k-1)\delta\sigma)$ this table
gives the probability of selecting the normal population with rank i when
the population with mean $\theta + (i-1)\delta\sigma$ has rank i, i = 1, 2, ..., k; the common
variance $\sigma^2$ is assumed to be known.

P* = .95, n = 5

         $\delta\sqrt{n}$
 k  i      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2  1    .912   .854   .776   .678   .566   .449   .337   .236   .155   .095
    2    .974   .987   .994   .998   .999  1.000  1.000  1.000  1.000  1.000

 3  1    .875   .729   .521   .304   .137   .047   .012   .002   .000   .000
    2    .939   .905   .848   .769   .670   .557   .441   .328   .230   .150
    3    .981   .993   .997   .999  1.000  1.000  1.000  1.000  1.000  1.000

 4  1    .824   .543   .230   .053   .006   .000   .000   .000   .000   .000
    2    .901   .778   .582   .359   .174   .064   .018   .004   .001   .000
    3    .953   .928   .881   .811   .721   .615   .499   .383   .277   .187
    4    .986   .995   .998   .999  1.000  1.000  1.000  1.000  1.000  1.000

 5  1    .753   .335   .054   .003   .000   .000   .000   .000   .000   .000
    2    .849   .585   .264   .065   .009   .001   .000   .000   .000   .000
    3    .917   .809   .622   .399   .202   .078   .023   .005   .001   .000
    4    .962   .941   .899   .837   .754   .652   .539   .422   .311   .215
    5    .989   .997   .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000

TABLE IIID

For the rule $R_1$ and the configuration $(\theta, \theta + \delta\sigma, \ldots, \theta + (k-1)\delta\sigma)$ this table
gives the expected average rank of the selected subset (top) and the expected
proportion of the populations selected in the subset (bottom) when the normal
population with mean $\theta + (i-1)\delta\sigma$ has rank i, i = 1, 2, ..., k; the common variance
$\sigma^2$ is assumed to be known.

P* = .95, n = 5

      $\delta\sqrt{n}$
 k      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2   1.429  1.414  1.382  1.336  1.282  1.224  1.168  1.118  1.078  1.047
      .943   .921   .885   .838   .783   .724   .668   .618   .578   .547

 3   1.899  1.839  1.736  1.613  1.492  1.387  1.298  1.220  1.153  1.100
      .932   .876   .789   .690   .602   .535   .484   .444   .410   .383

 4   2.357  2.216  2.007  1.801  1.629  1.493  1.383  1.289  1.208  1.140
      .916   .811   .673   .556   .475   .420   .379   .347   .319   .297

The problem of selecting a subset containing the largest location
parameter of several double exponential populations has been considered by
Gupta and Leong (1976), and the selection rule proposed in Section 3.0 has
been investigated using both exact and large sample distributions of the sample
median. We include some of the results of Gupta and Leong (1976) for the sake
of completeness, and investigate the problem a little further by numerically
computing the values of the functions $P(i,k,P^*,\delta,n \mid R_1)$ and $\gamma(k,P^*,\delta,n \mid R_1)$
defined in Section 5.3 when the location parameters of the double exponential
populations are equally spaced. We also compute the ARE of the rule $R_1$ relative
to a rule based on sample means. It is seen in this case that the rule $R_1$
based on sample medians is superior to the rule based on sample means in terms
of the ARE.

6.1 Selection of the Largest of Location Parameters of Several Double
Exponential Populations

Let $\pi_1, \ldots, \pi_k$ be k independent double exponential populations with location
parameters $\theta_1, \ldots, \theta_k$, respectively. For the problem of selecting a subset
containing the largest location parameter, the equation for the constant
d of the rule $R_1$ is given in Gupta and Leong (1976), and can also be obtained
by substituting


in  the  equation  (3. P.3). 

Since the double exponential distribution has the MLR property, it
follows from Section 3.2 that the rule $R_1$ has the properties mentioned
in that section. This has also been observed by Gupta and Leong (1976).

6.2 On the Performance of the Rule $R_1$ when the Location Parameters are
Equally Spaced

Suppose the location parameters $\theta_1, \ldots, \theta_k$ of the k given double exponential
populations are equally spaced, i.e., $\theta_i = \theta + (i-1)\delta$, i = 1, ..., k, where $\delta > 0$
is a known constant. Then $P(i,k,P^*,\delta,n \mid R_1)$, the probability with which the rule
$R_1$ selects the population associated with $\theta_i$, is given by

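$$P(i,k,P^*,\delta,n \mid R_1) = \frac{(2m+1)!}{2\,(m!)^2} \int_0^\infty \left[ h_1(u \mid i,k,P^*,\delta,n) + h_2(u \mid i,k,P^*,\delta,n) \right] g(u)\, du \qquad (6.2.1)$$

where

$$h_1(u \mid i,k,P^*,\delta,n) = \prod_{j=1,\, j \ne i}^{k} I_{F(-u + (d-(j-i)\delta)\sqrt{2})}(m+1,\, m+1),$$

$$h_2(u \mid i,k,P^*,\delta,n) = \prod_{j=1,\, j \ne i}^{k} I_{F(u + (d-(j-i)\delta)\sqrt{2})}(m+1,\, m+1),$$

$F(x) = \tfrac{1}{2} e^{x}$ for $x < 0$ and $F(x) = 1 - \tfrac{1}{2} e^{-x}$ for $x \ge 0$, $I_x(a,b)$ denotes the
incomplete beta function ratio,

and $g(u) = \left[ (1 - \tfrac{1}{2} e^{-u})(\tfrac{1}{2} e^{-u}) \right]^m e^{-u}$.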

Expressions  for  the  expected  sum  of  ranks 
and  the  expected  average  rank  of  the  populations  retained  in  the  subset  can 
be  obtained  from  (5.3.5)  and  (6.2.1). 

For selected values of k, n and P*, tables of the constant d for double
exponential populations are given in Gupta and Leong (1976). Using these tables,
we have computed the values of the function $P(i,k,P^*,\delta,n \mid R_1)$, the expected
average rank and the expected proportion of populations in the selected subset
for n = 3, 5; P* = .75, .90, .95, .99; k = 2(1)5 and $\delta$ = 0.5(0.5)5.0. Computations
were made on a CDC 6500 using Gauss-Laguerre quadrature based on fifteen nodes
for the numerical integration. Tables are given at the end of this section.

For example, if n = 3, P* = .90, k = 5 and $\delta$ = 1.5, the probabilities of selecting
the third best, the second best, and the best populations are .108, .794 and 1.000,
in that order (Table IV A). The expected average rank in this case is 1.701 and the expected
proportion of the selected populations is .381 (Table V A).
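
A selection probability of this kind can be approximated with a few lines of numerical integration. The sketch below makes several assumptions that are not taken from the source: the populations are scaled to unit variance, so that a shift of $d - (j-i)\delta$ becomes $\sqrt{2}\,(d - (j-i)\delta)$ in scale-one units; composite Simpson integration replaces the fifteen-node Gauss-Laguerre rule; and d = 1.378 is an illustrative value, not a figure quoted from Gupta and Leong (1976). All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def de_cdf(x: float) -> float:
    """cdf of the scale-one double exponential density exp(-|x|)/2."""
    return 0.5 * math.exp(x) if x < 0.0 else 1.0 - 0.5 * math.exp(-x)


def median_cdf(x: float, m: int) -> float:
    """cdf of the median of n = 2m+1 scale-one double exponential draws."""
    n = 2 * m + 1
    p = de_cdf(x)
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(m + 1, n + 1))


def median_pdf(x: float, m: int) -> float:
    """pdf of the same sample median."""
    n = 2 * m + 1
    p = de_cdf(x)
    c = math.factorial(n) / math.factorial(m) ** 2
    return c * (p * (1.0 - p)) ** m * 0.5 * math.exp(-abs(x))


def selection_probability(i, k, d, delta, m, half_width=30.0, steps=3000):
    """P(population i is selected) for theta_j = theta + (j-1)*delta.

    Assumes unit-variance populations, hence the sqrt(2) on every shift.
    """
    def integrand(x):
        value = median_pdf(x, m)
        for j in range(1, k + 1):
            if j != i:
                value *= median_cdf(x + SQRT2 * (d - (j - i) * delta), m)
        return value

    h = 2.0 * half_width / steps
    total = integrand(-half_width) + integrand(half_width)
    for s in range(1, steps):
        total += (4 if s % 2 else 2) * integrand(-half_width + s * h)
    return total * h / 3.0


d_assumed = 1.378  # illustrative constant, see the note above
p_lower = selection_probability(1, 2, d_assumed, 0.5, m=1)
p_best = selection_probability(2, 2, d_assumed, 0.5, m=1)
print(f"{p_lower:.3f} {p_best:.3f}")
```

With these inputs (k = 2, n = 3, $\delta$ = 0.5) the two printed values can be compared with the first column of the k = 2 block of Table IV A.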

6.3 A comparison of rules based on medians and means of large samples

Let $\pi_1, \ldots, \pi_k$ be k independent double exponential populations with means
$\theta_1, \ldots, \theta_k$, respectively, and common variance unity. Assume that for some
(unknown) index $i_0$ ($1 \le i_0 \le k$), $\theta_{i_0} - \Delta = \theta_i = \theta$, i = 1, ..., k, $i \ne i_0$, where
$\Delta > 0$ is an unknown constant. Let $\bar{X}_i$ and $\tilde{X}_i$ denote the sample mean and sample
median of an independent sample of size n = 2m+1 (m ≥ 1) from $\pi_i$ (i = 1, ..., k).
For the problem of selecting a subset containing the largest mean $\theta_{i_0}$, the
following two rules can be used:

$$\bar{R}: \text{ Select } \pi_i \text{ iff } \bar{X}_i \ge \bar{X}_{[k]} - \bar{d}/\sqrt{2m+1} \qquad (6.3.1)$$

$$\tilde{R}: \text{ Select } \pi_i \text{ iff } \tilde{X}_i \ge \tilde{X}_{[k]} - \tilde{d}/\sqrt{2(2m+1)} \qquad (6.3.2)$$

where the constants $\bar{d} \ge 0$ and $\tilde{d} \ge 0$ are determined by the basic probability
requirement. If the sample size n is sufficiently large, $\bar{X}_i$ and $\tilde{X}_i$ are both
approximately normally distributed with mean $\theta_i$ and variances 1/(2m+1) and
1/(2(2m+1)), respectively. It is easy to see, as in Section 5.5, that

$$\bar{d} = \tilde{d} = d, \text{ say.} \qquad (6.3.3)$$

In the notation of Section 5.5, let S* be the number of non-best populations
in the selected subset, and let $N_{R'}(\epsilon)$ be the number of observations needed so that

$$E(S^* \mid R') = \epsilon.$$

Following the method of Section 5.5 we can see that

$$N_{\bar{R}}(\epsilon) = 2 N_{\tilde{R}}(\epsilon)$$

and hence we have

$$\mathrm{ARE}(\tilde{R}, \bar{R}) = \lim_{\epsilon \to 0} \frac{N_{\bar{R}}(\epsilon)}{N_{\tilde{R}}(\epsilon)} = 2.$$
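
The factor 2 is the ratio of the large-sample variances of the sample mean and the sample median for a double exponential population. A rough Monte Carlo check is sketched below; the sample size, replication count and seed are arbitrary choices, and for finite n the ratio is only approximately 2.

```python
import random
import statistics


def double_exponential(rng: random.Random) -> float:
    """One scale-one double exponential draw (difference of two Exp(1))."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def variance_ratio(n: int, replications: int, seed: int = 12345) -> float:
    """Estimate Var(sample mean) / Var(sample median) by simulation."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replications):
        sample = [double_exponential(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means) / statistics.pvariance(medians)


ratio = variance_ratio(n=201, replications=3000)
print(round(ratio, 2))
```

The scale of the population does not affect the ratio, so the unit-variance assumption of this section is not needed for the check.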

TABLE IV A

For the rule $R_1$ and the configuration $(\theta, \theta + \delta, \ldots, \theta + (k-1)\delta)$ this table
gives the probability of selecting the double exponential population with
rank i when the population with mean $\theta + (i-1)\delta$ has rank i (i = 1, ..., k).

P* = .90, n = 3

         $\delta$
 k  i      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2  1    .879   .705   .429   .195   .073   .025   .008   .002   .001   .000
    2    .986   .996   .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 3  1    .814   .335   .050   .005   .000   .000   .000   .000   .000   .000
    2    .939   .840   .634   .354   .152   .055   .018   .006   .002   .000
    3    .993   .998   .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 4  1    .687   .067   .002   .000   .000   .000   .000   .000   .000   .000
    2    .874   .450   .079   .008   .001   .000   .000   .000   .000   .000
    3    .960   .894   .737   .468   .219   .084   .030   .009   .003   .001
    4    .995   .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 5  1    .483   .007   .000   .000   .000   .000   .000   .000   .000   .000
    2    .753   .095   .003   .000   .000   .000   .000   .000   .000   .000
    3    .906   .538   .108   .012   .001   .000   .000   .000   .000   .000
    4    .971   .922   .794   .553   .280   .114   .041   .013   .004   .001
    5    .997   .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

TABLE V A

For the rule $R_1$ and the configuration $(\theta, \theta + \delta, \ldots, \theta + (k-1)\delta)$ this table gives
the expected average rank of the selected subset (top) and the expected proportion
of the populations selected in the subset (bottom) when the double exponential
population with mean $\theta + (i-1)\delta$ has rank i (i = 1, ..., k).

P* = .90, n = 3

      $\delta$
 k     0.5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2   1.425  1.348  1.213  1.097  1.036  1.013  1.004  1.001  1.000  1.000
      .932   .851   .714   .598   .537   .513   .504   .501   .500   .500

 3   1.890  1.670  1.439  1.237  1.102  1.037  1.012  1.004  1.001  1.000
      .915   .724   .561   .453   .384   .352   .339   .335   .334   .333

 4   2.324  1.911  1.592  1.355  1.165  1.063  1.022  1.007  1.002  1.001
      .879   .603   .454   .369   .305   .271   .257   .252   .251   .250

 5   2.715  2.099  1.701  1.450  1.225  1.091  1.033  1.010  1.003  1.001
      .822   .512   .381   .313   .256   .223   .208   .203   .201   .200

TABLE IV B

For the rule $R_1$ and the configuration $(\theta, \theta + \delta, \ldots, \theta + (k-1)\delta)$ this table gives the
probability of selecting the double exponential population with rank i when the
population with mean $\theta + (i-1)\delta$ has rank i (i = 1, ..., k).

P* = .95, n = 3

         $\delta$
 k  i      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2  1    .954   .872   .692   .414   .187   .069   .024   .007   .002   .001
    2    .995   .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 3  1    .927   .604   .139   .016   .001   .000   .000   .000   .000   .000
    2    .978   .939   .832   .618   .339   .144   .052   .017   .005   .002
    3    .998   .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 4  1    .863   .188   .006   .000   .000   .000   .000   .000   .000   .000
    2    .952   .712   .203   .026   .002   .000   .000   .000   .000   .000
    3    .986   .960   .888   .723   .451   .209   .079   .028   .009   .003
    4    .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 5  1    .739   .027   .000   .000   .000   .000   .000   .000   .000   .000
    2    .898   .246   .009   .000   .000   .000   .000   .000   .000   .000
    3    .965   .774   .261   .037   .004   .000   .000   .000   .000   .000
    4    .990   .971   .917   .784   .536   .267   .107   .038   .012   .004
    5    .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

TABLE V B

For the rule $R_1$ and the configuration $(\theta, \theta + \delta, \ldots, \theta + (k-1)\delta)$ this table gives
the expected average rank of the selected subset (top) and the expected proportion
of the populations selected in the subset (bottom) when the double exponential
population with mean $\theta + (i-1)\delta$ has rank i (i = 1, ..., k).

P* = .95, n = 3

      $\delta$
 k     0.5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2   1.472  1.435  1.346  1.207  1.093  1.035  1.012  1.004  1.001  1.000
      .975   .935   .846   .707   .593   .535   .512   .504   .501   .500

 3   1.959  1.827  1.601  1.418  1.226  1.096  1.035  1.011  1.004  1.001
      .968   .847   .657   .545   .447   .381   .351   .339   .335   .334

 4   2.430  2.123  1.769  1.556  1.339  1.157  1.059  1.021  1.006  1.002
      .950   .715   .524   .437   .363   .302   .270   .257   .252   .251

 5   2.877  2.345  1.894  1.649  1.431  1.214  1.086  1.031  1.010  1.003
      .918   .604   .438   .364   .308   .253   .221   .208   .202   .201


TABLE IV C

For the rule $R_1$ and the configuration $(\theta, \theta + \delta, \ldots, \theta + (k-1)\delta)$ this table gives
the probability of selecting the double exponential population with rank i when
the population with mean $\theta + (i-1)\delta$ has rank i (i = 1, ..., k).

P* = .90, n = 5

         $\delta$
 k  i      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2  1    .833   .524   .194   .050   .010   .002   .000   .000   .000   .000
    2    .991   .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 3  1    .680   .100   .004   .000   .000   .000   .000   .000   .000   .000
    2    .914   .708   .350   .108   .024   .005   .001   .000   .000   .000
    3    .996   .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 4  1    .412   .006   .000   .000   .000   .000   .000   .000   .000   .000
    2    .773   .148   .007   .000   .000   .000   .000   .000   .000   .000
    3    .945   .794   .458   .157   .038   .008   .001   .000   .000   .000
    4    .998  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 5  1    .157   .000   .000   .000   .000   .000   .000   .000   .000   .000
    2    .495   .008   .000   .000   .000   .000   .000   .000   .000   .000
    3    .824   .194   .010   .000   .000   .000   .000   .000   .000   .000
    4    .960   .841   .538   .203   .053   .011   .002   .000   .000   .000
    5    .998  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

TABLE V C

For the rule $R_1$ and the configuration $(\theta, \theta + \delta, \ldots, \theta + (k-1)\delta)$ this table gives the
expected average rank of the selected subset (top) and the expected proportion of
the populations selected in the subset (bottom) when the double exponential population
with mean $\theta + (i-1)\delta$ has rank i (i = 1, ..., k).

P* = .90, n = 5

      $\delta$
 k     0.5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2   1.408  1.260  1.097  1.025  1.005  1.001  1.000  1.000  1.000  1.000
      .912   .761   .597   .525   .505   .501   .500   .500   .500   .500

 3   1.832  1.505  1.235  1.072  1.016  1.003  1.001  1.000  1.000  1.000
      .863   .602   .451   .369   .341   .335   .334   .333   .333   .333

 4   2.196  1.671  1.347  1.118  1.029  1.006  1.001  1.000  1.000  1.000
      .782   .487   .366   .289   .260   .252   .250   .250   .250   .250

 5   2.490  1.792  1.436  1.162  1.043  1.009  1.002  1.000  1.000  1.000
      .687   .409   .310   .241   .211   .202   .200   .200   .200   .200

TABLE IV D

For the rule $R_1$ and the configuration $(\theta, \theta + \delta, \ldots, \theta + (k-1)\delta)$ this table gives the
probability of selecting the double exponential population with rank i when the
population with mean $\theta + (i-1)\delta$ has rank i (i = 1, ..., k).

P* = .95, n = 5

         $\delta$
 k  i      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2  1    .933   .756   .405   .132   .031   .006   .001   .000   .000   .000
    2    .997  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 3  1    .857   .239   .013   .000   .000   .000   .000   .000   .000   .000
    2    .969   .871   .600   .248   .069   .014   .003   .000   .000   .000
    3    .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 4  1    .668   .018   .000   .000   .000   .000   .000   .000   .000   .000
    2    .902   .333   .022   .001   .000   .000   .000   .000   .000   .000
    3    .981   .912   .699   .341   .104   .023   .005   .001   .000   .000
    4    .999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

 5  1    .365   .001   .000   .000   .000   .000   .000   .000   .000   .000
    2    .736   .026   .000   .000   .000   .000   .000   .000   .000   .000
    3    .927   .406   .031   .001   .000   .000   .000   .000   .000   .000
    4    .986   .935   .763   .413   .135   .032   .006   .001   .000   .000
    5   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

TABLE V D

For the rule $R_1$ and the configuration $(\theta, \theta + \delta, \ldots, \theta + (k-1)\delta)$ this table gives
the expected average rank of the selected subset (top) and the expected proportion
of the populations selected in the subset (bottom) when the double exponential
population with mean $\theta + (i-1)\delta$ has rank i (i = 1, ..., k).

P* = .95, n = 5

      $\delta$
 k     0.5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0

 2   1.464  1.378  1.202  1.066  1.015  1.003  1.001  1.000  1.000  1.000
      .965   .878   .702   .566   .515   .503   .501   .500   .500   .500

 3   1.931  1.660  1.404  1.165  1.046  1.010  1.002  1.000  1.000  1.000
      .942   .703   .538   .416   .356   .338   .334   .333   .333   .333

 4   2.353  1.855  1.535  1.256  1.078  1.017  1.003  1.001  1.000  1.000
      .887   .566   .430   .335   .276   .256   .251   .250   .250   .250

 5   2.712  2.002  1.629  1.331  1.108  1.025  1.005  1.001  1.000  1.000
      .803   .474   .359   .283   .227   .206   .201   .200   .200   .200